Cross-domain Feature Selection for Language Identification
نویسندگان
چکیده
We show that transductive (cross-domain) learning is an important consideration in building a general-purpose language identification system, and develop a feature selection method that generalizes across domains. Our results demonstrate that our method provides improvements in transductive transfer learning for language identification. We provide an implementation of the method and show that our system is faster than popular standalone language identification systems, while maintaining competitive accuracy.
منابع مشابه
Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method
Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...
متن کاملA New Framework for Distributed Multivariate Feature Selection
Feature selection is considered as an important issue in classification domain. Selecting a good feature through maximum relevance criterion to class label and minimum redundancy among features affect improving the classification accuracy. However, most current feature selection algorithms just work with the centralized methods. In this paper, we suggest a distributed version of the mRMR featu...
متن کاملA General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram
Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission and based on the statistics given for October 2019 to present, 300 million people worldwide used telegram per month. Telegram users are more concentrated in...
متن کاملGenetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification
This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian Language. Manipuri is listed in the Eight Schedule of Indian Constitution. MWE plays an important role in the applications of Natural Language Processing(NLP) like Machine Translation, Part of Speech tagging, Information Retrieval, Question Answering etc. Feature selection is an i...
متن کاملFeature Selection Evaluation for Light Human Motion Identification in Frailty Monitoring System
In order to plan and deliver health care in a world with increasing number of older people, human motion monitoring is a must in their surveillance, since the related information is crucial for understanding their physical status. In this article, we focus on the physiological function and motor performance thus we present a light human motion identification scheme together with preliminary eva...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011